Frequency Ratio: a method for dealing with missing values within nearest neighbour search
نویسندگان
چکیده
In this paper we introduce the Frequency Ratio (FR) method for dealing with missing values within nearest neighbour search. We test the FR method on known medical datasets from the UCI machine learning repository. We compare the accuracy of the FR method with five commonly used methods (three “imputation” and two “bypassing” methods) for dealing with values that are “missing completely at random” (MCAR) for the purpose of classification. We discovered that in most cases, the FR method outperforms the other methods. We conclude that the FR method is a strong addition to the commonly used methods for dealing with missing values within the nearest neighbour method.
منابع مشابه
An Analysis of Four Missing Data Treatment Methods for Supervised Learning
One relevant problem in data quality is the presence of missing data. Despite the frequent occurrence and the relevance of missing data problem, many Machine Learning algorithms handle missing data in a rather naive way. However, missing data treatment should be carefully thought, otherwise bias might be introduced into the knowledge induced. In this work we analyse the use of the k-nearest nei...
متن کاملA Study of K-Nearest Neighbour as an Imputation Method
Data quality is a major concern in Machine Learning and other correlated areas such as Knowledge Discovery from Databases (KDD). As most Machine Learning algorithms induce knowledge strictly from data, the quality of the knowledge extracted is largely determined by the quality of the underlying data. One relevant problem in data quality is the presence of missing data. Despite the frequent occu...
متن کاملP. Jönsson and C. Wohlin, "benchmarking K-nearest Neighbour Imputation with Homogeneous Likert Data", Empirical Software Engineering: an Benchmarking K-nearest Neighbour Imputation with Homogeneous Likert Data
Missing data are common in surveys regardless of research field, undermining statistical analyses and biasing results. One solution is to use an imputation method, which recovers missing data by estimating replacement values. Previously, we have evaluated the hot-deck k-Nearest Neighbour (kNN) method with Likert data in a software engineering context. In this paper, we extend the evaluation by ...
متن کاملFractal Approximate Nearest Neighbour Search in Log-Log Time
Nearest neighbour searches in the image plane are among the most frequent problems in a variety of computer vision and image processing tasks. They can be used to replace missing values in image filtering, or to group close objects in image segmentation, or to access neighbouring points of interest in feature extraction. In particular, we address two nearest neighbour problems: The nearest neig...
متن کاملComparison of imputation methods for missing laboratory data in medicine
OBJECTIVES Missing laboratory data is a common issue, but the optimal method of imputation of missing values has not been determined. The aims of our study were to compare the accuracy of four imputation methods for missing completely at random laboratory data and to compare the effect of the imputed values on the accuracy of two clinical predictive models. DESIGN Retrospective cohort analysi...
متن کامل